Spatial understanding is a fundamental aspect of computer vision and integral to human-level reasoning about images, making it an important component of grounded language understanding. While recent large-scale text-to-image synthesis (T2I) models have shown unprecedented improvements in photorealism, it is unclear whether they have reliable spatial understanding capabilities. We investigate the ability of T2I models to generate correct spatial relationships among objects and present VISOR, an evaluation metric that captures how accurately the spatial relationship described in text is generated in the image. To benchmark existing models, we introduce SR2D, a large-scale challenge dataset of sentences that describe two objects and the spatial relationship between them. We construct an automated evaluation pipeline that uses computer vision to recognize objects and their spatial relationships, and we employ it in a large-scale evaluation of T2I models. Our experiments reveal the surprising finding that, although recent state-of-the-art T2I models exhibit high image quality, they are severely limited in their ability to generate multiple objects or the specified spatial relations between them, such as left/right/above/below. Our analyses demonstrate several biases and artifacts of T2I models, including difficulty generating multiple objects, a bias toward the first object mentioned, spatially inconsistent outputs for equivalent relationships, and a correlation between object co-occurrence and spatial understanding capability. We conduct a human study that shows the alignment between VISOR and human judgments about spatial understanding. We offer the SR2D dataset and the VISOR metric to the community in support of T2I spatial reasoning research.
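The abstract does not spell out VISOR's decision rule, but the underlying check can be illustrated with a short sketch: run an object detector on the generated image, then compare bounding-box centroids to decide whether the stated relation holds. Everything below (the centroid rule, function names, and the detection format) is an assumption for illustration, not the authors' released implementation.

```python
# Minimal sketch of a VISOR-style spatial-relationship check.
# Assumption: objects are located with an off-the-shelf detector and the
# relation is decided by comparing bounding-box centroids; the actual
# VISOR implementation may differ.

from typing import Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)

def centroid(box: Box) -> Tuple[float, float]:
    x_min, y_min, x_max, y_max = box
    return ((x_min + x_max) / 2.0, (y_min + y_max) / 2.0)

def relation_holds(box_a: Box, box_b: Box, relation: str) -> bool:
    """Check whether object A stands in `relation` to object B.

    Image coordinates: x grows rightward, y grows downward.
    """
    (ax, ay), (bx, by) = centroid(box_a), centroid(box_b)
    if relation == "left of":
        return ax < bx
    if relation == "right of":
        return ax > bx
    if relation == "above":
        return ay < by
    if relation == "below":
        return ay > by
    raise ValueError(f"unknown relation: {relation}")

def visor_score(detections: dict, obj_a: str, obj_b: str, relation: str) -> float:
    """1.0 iff both objects were detected and the relation holds, else 0.0.

    `detections` maps object names to bounding boxes (hypothetical format).
    """
    if obj_a not in detections or obj_b not in detections:
        return 0.0  # object generation already failed
    return float(relation_holds(detections[obj_a], detections[obj_b], relation))

# Example: "a dog to the left of a cat"
print(visor_score({"dog": (10, 40, 60, 90), "cat": (120, 35, 170, 95)},
                  "dog", "cat", "left of"))  # -> 1.0
```

In this reading, failing to generate either object already yields a score of zero, which matches the paper's observation that generating multiple objects is itself a bottleneck.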
Toxic language detection systems often falsely flag text that contains minority group mentions as toxic, as those groups are often the targets of online hate. Such over-reliance on spurious correlations also causes systems to struggle with detecting implicitly toxic language. To help mitigate these issues, we create ToxiGen, a new large-scale, machine-generated dataset of 274k toxic and benign statements about 13 minority groups. We develop a demonstration-based prompting framework and an adversarial classifier-guided decoding method to generate subtly toxic and benign text with a massive pretrained language model. Controlling machine generation in this way allows ToxiGen to cover implicitly toxic text at a larger scale, and across more demographic groups, than previous resources of human-written text. We conduct a human evaluation on a challenging subset of ToxiGen and find that annotators struggle to distinguish machine-generated text from human-written language. We also find that 94.5% of toxic examples are labeled as hate speech by human annotators. Using three publicly available datasets, we show that finetuning a toxicity classifier on our data substantially improves its performance on human-written data. We also demonstrate that ToxiGen can be used to combat machine-generated toxicity, as finetuning significantly improves the classifier on our evaluation subset. Our code and data can be found at https://github.com/microsoft/toxigen.
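The released prompts and decoding procedure are in the linked repository; as a rough, hypothetical sketch of what demonstration-based prompting looks like, one concatenates a handful of statements about a target group and asks a pretrained language model to continue the list. The demonstrations and template below are placeholders, not the actual ToxiGen prompts.

```python
# Hypothetical sketch of demonstration-based prompting: a handful of
# human-curated statements about a target group is concatenated into a
# prompt, and a pretrained LM is asked to continue in the same style.
# The demonstrations and the generation call below are placeholders,
# not the actual ToxiGen prompts or decoding method.

def build_prompt(demonstrations: list) -> str:
    # One statement per line; the LM is expected to produce the next line.
    return "\n".join(f"- {d}" for d in demonstrations) + "\n- "

benign_demos = [
    "many first-generation immigrants hold more than one job",
    "community centers are often run by volunteers from the neighborhood",
]
prompt = build_prompt(benign_demos)
print(prompt)

# With an actual model (e.g., via the Hugging Face `transformers` API) one
# would then sample a continuation, optionally steering the decoding with
# a toxicity classifier as the paper's adversarial decoding does:
#
#   from transformers import pipeline
#   generator = pipeline("text-generation", model="gpt2")
#   continuation = generator(prompt, max_new_tokens=30)
```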
Disaggregated evaluations of AI systems, in which system performance is assessed and reported separately for different groups of people, are conceptually simple. However, their design involves a variety of choices. Some of these choices influence the results that will be obtained, and thus the conclusions that can be drawn; others influence the impacts, both beneficial and harmful, that a disaggregated evaluation will have on people, including the people whose data is used to conduct the evaluation. We argue that a deeper understanding of these choices will enable researchers and practitioners to design careful and conclusive disaggregated evaluations. We also argue that better documentation of these choices, along with the underlying considerations and tradeoffs that were made, will help others interpret an evaluation's results and conclusions.
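For readers who want the core mechanic made concrete: a disaggregated evaluation computes the same metric separately per group instead of once over the pooled data. The toy sketch below (made-up labels, accuracy as the metric) shows how an aggregate number can hide a large gap between groups.

```python
# Toy sketch of a disaggregated evaluation: accuracy is computed per
# group rather than once over the pooled data. Labels and groups are
# made up for illustration.

from collections import defaultdict

def disaggregated_accuracy(y_true, y_pred, groups):
    counts = defaultdict(lambda: [0, 0])  # group -> [correct, total]
    for t, p, g in zip(y_true, y_pred, groups):
        counts[g][0] += int(t == p)
        counts[g][1] += 1
    return {g: correct / total for g, (correct, total) in counts.items()}

y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0]
groups = ["A", "A", "A", "B", "B", "B"]

print(disaggregated_accuracy(y_true, y_pred, groups))  # {'A': 1.0, 'B': 0.0}
# The pooled accuracy is 0.5, which on its own hides the gap entirely.
```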
Many modern research fields increasingly rely on collecting and analysing massive, often unstructured, and unwieldy datasets. Consequently, there is growing interest in machine learning and artificial intelligence applications that can harness this 'data deluge'. This broad nontechnical overview provides a gentle introduction to machine learning with a specific focus on medical and biological applications. We explain the common types of machine learning algorithms and typical tasks that can be solved, illustrating the basics with concrete examples from healthcare. Lastly, we provide an outlook on open challenges, limitations, and potential impacts of machine-learning-powered medicine.
The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about common practice, or about the bottlenecks the community faces in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical image analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, and algorithm characteristics. A median of 72% of challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive for participation (70%), while the prospect of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants (32%) stated that they did not have enough time for it, and 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning based; of these, 84% used standard architectures. 43% of respondents reported that the data samples (e.g., images) were too large to be processed at once, which was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of participants, and only 50% performed ensembling, based on either multiple identical models (61%) or heterogeneous models (39%). 48% of respondents applied postprocessing steps.
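As context for the most common workaround reported above: patch-based training tiles an image that is too large for memory into fixed-size crops and trains on those. A minimal sketch of the data side follows; the patch size and stride are arbitrary illustrative choices.

```python
# Minimal sketch of patch-based training's data side: tile a large 2D
# image into fixed-size patches that fit in memory. Patch size and
# stride are arbitrary illustrative choices.

import numpy as np

def extract_patches(image: np.ndarray, patch: int, stride: int) -> np.ndarray:
    h, w = image.shape[:2]
    patches = [
        image[y:y + patch, x:x + patch]
        for y in range(0, h - patch + 1, stride)
        for x in range(0, w - patch + 1, stride)
    ]
    return np.stack(patches)

image = np.random.rand(512, 512)            # stand-in for a large scan
patches = extract_patches(image, patch=128, stride=128)
print(patches.shape)                        # (16, 128, 128)
```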
In this study, we address the problem of efficient exploration in reinforcement learning. The most common exploration approaches rely on random action selection, but these methods do not work well in environments with sparse or no rewards. We propose an intrinsic reward module based on a generative adversarial network, which learns the distribution of observed states and emits an intrinsic reward, computed to be high for out-of-distribution states, in order to lead the agent toward unexplored states. We evaluate our approach in Super Mario Bros for a no-reward setting and in Montezuma's Revenge for a sparse-reward setting, and show that our approach is indeed capable of exploring efficiently. We discuss some weaknesses and conclude by discussing future work.
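The abstract leaves the architecture details to the paper; one plausible reading, sketched below, trains a GAN discriminator on visited states and pays the agent a bonus inversely related to the discriminator's familiarity score, so out-of-distribution states earn high intrinsic reward. Network sizes and the exact reward shaping here are assumptions.

```python
# Sketch of a GAN-based intrinsic reward module (assumptions: the
# discriminator is trained to score visited states as "real"; states
# it scores as unfamiliar receive a high intrinsic reward). Network
# sizes and reward shaping are illustrative, not the paper's design.

import torch
import torch.nn as nn

class Discriminator(nn.Module):
    def __init__(self, state_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 128), nn.ReLU(),
            nn.Linear(128, 1), nn.Sigmoid(),  # p(state comes from visited distribution)
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)

def intrinsic_reward(disc: Discriminator, state: torch.Tensor) -> torch.Tensor:
    # Low familiarity -> high bonus, pushing the agent toward novel states.
    with torch.no_grad():
        familiarity = disc(state)
    return 1.0 - familiarity

# The discriminator is trained online, GAN-style, with visited states as
# "real" samples; the bonus is added to the (possibly zero) environment reward.
disc = Discriminator(state_dim=8)
state = torch.randn(1, 8)
r_total = 0.0 + intrinsic_reward(disc, state).item()
print(r_total)
```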
Sketches are abstract representations of visual perception and visuospatial construction. In this work, we propose a new framework, GAN-CNMP, which incorporates a novel adversarial loss into CNMP to improve sketch smoothness and consistency. Through experiments, we show that our model can be trained with a small number of unlabeled samples, can construct distributions automatically in the latent space, and produces better results than the base model in terms of shape consistency and smoothness.
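The abstract does not give the loss formulation; the fragment below sketches the general recipe of adding an adversarial term to a trajectory model's objective, in the spirit of GAN-CNMP. The trade-off weight, tensor shapes, and loss composition are placeholders rather than the paper's actual design.

```python
# Illustrative sketch of adding an adversarial term to a trajectory
# model's loss. The CNMP architecture itself is omitted; `lambda_adv`
# and the discriminator score below are placeholders.

import torch
import torch.nn as nn

bce = nn.BCELoss()
mse = nn.MSELoss()
lambda_adv = 0.1  # assumed trade-off weight

def generator_loss(pred_traj, true_traj, disc_score_on_pred):
    # Reconstruction term keeps the sketch close to the demonstration;
    # adversarial term rewards trajectories the discriminator finds
    # indistinguishable from real, smooth sketches.
    recon = mse(pred_traj, true_traj)
    adv = bce(disc_score_on_pred, torch.ones_like(disc_score_on_pred))
    return recon + lambda_adv * adv

# Example with dummy tensors:
pred = torch.randn(4, 50, 2)              # batch of 2-D sketch trajectories
true = torch.randn(4, 50, 2)
score = torch.sigmoid(torch.randn(4, 1))  # discriminator output on pred
print(generator_loss(pred, true, score).item())
```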
In federated learning, each participant trains its local model with its own data, and a global model is formed on a trusted server by aggregating the model updates coming from these participants. Since the server has no influence on or visibility into the participants' training procedures, in order to preserve privacy, the global model becomes vulnerable to attacks such as data poisoning and model poisoning. Although many defense algorithms have recently been proposed to address these attacks, they often make strong assumptions that do not agree with the nature of federated learning, such as non-IID datasets. Moreover, they mostly lack comprehensive experimental analysis. In this work, we propose a defense algorithm called BARFED that makes no assumptions about data distribution, the similarity of participants' updates, or the ratio of malicious participants. BARFED mainly considers the outlier status of each participant's update for every layer of the model architecture, based on the distance to the global model. Only participants with no outlier layer then take part in model aggregation. We perform extensive experiments in many settings and show that the proposed approach provides a robust defense against different attacks.
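One way to read the layer-wise rule described above: for each layer, compute every participant's distance to the global model's weights, flag statistical outliers, and aggregate only participants with no flagged layer. The IQR outlier rule and plain averaging in the sketch below are assumptions, not necessarily BARFED's exact procedure.

```python
# Sketch of BARFED-style layer-wise outlier filtering (interpretation of
# the abstract: per layer, compute each participant's distance to the
# global model and flag outliers with an IQR rule; only participants
# with no outlier layer are aggregated). Details are assumptions.

import numpy as np

def outlier_mask(distances: np.ndarray) -> np.ndarray:
    # Classic IQR rule: a distance is an outlier if above Q3 + 1.5*IQR.
    q1, q3 = np.percentile(distances, [25, 75])
    return distances > q3 + 1.5 * (q3 - q1)

def barfed_aggregate(global_model, updates):
    """`global_model` and each update: dict layer_name -> np.ndarray."""
    flagged = np.zeros(len(updates), dtype=bool)
    for layer in global_model:
        d = np.array([np.linalg.norm(u[layer] - global_model[layer]) for u in updates])
        flagged |= outlier_mask(d)  # any outlier layer excludes the participant
    kept = [u for u, bad in zip(updates, flagged) if not bad]
    return {layer: np.mean([u[layer] for u in kept], axis=0) for layer in global_model}

# Example: 5 honest-ish updates plus one with poisoned weights.
rng = np.random.default_rng(0)
g = {"w1": rng.normal(size=(4, 4)), "w2": rng.normal(size=4)}
ups = [{k: v + 0.01 * rng.normal(size=v.shape) for k, v in g.items()} for _ in range(5)]
ups.append({k: v + 10.0 for k, v in g.items()})   # poisoned participant
agg = barfed_aggregate(g, ups)
print(np.linalg.norm(agg["w1"] - g["w1"]))        # small: the outlier was filtered
```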